Automatic generation of synthesis units for trainable text-to-speech systems
Abstract
The Whistler Text-to-Speech engine was designed so that its model parameters can be constructed automatically from training data. This paper describes in detail the design issues involved in constructing the synthesis unit inventory automatically from speech databases. The automatic process includes (1) determining scalable synthesis units that reflect the spectral variations of different allophones; (2) segmenting the recorded sentences into phonetic segments; and (3) selecting good instances of each synthesis unit to generate the best synthesized sentence at run time. These processes are all derived using probabilistic learning methods that are aimed at the same optimization criteria. Through this automatic unit generation, Whistler can automatically produce synthetic speech that sounds very natural and resembles the acoustic characteristics of the original speaker.
Similar resources
Identification and automatic generation of prosodic contours for a text-to-speech synthesis system in French
This paper presents the realisation of an automatically trainable computational prosodic model for French Text-to-Speech Synthesis. The methodology proposes the construction of the model in two steps. The first step consists in predicting fundamental frequency contours and duration of syllables from abstract prosodic markers using neural networks [17,12]. In this step, the abstract prosodic mark...
Generation of Unit Databases for the UPC Text to Speech System
This paper describes a method for the generation of unit databases for concatenative text-to-speech systems. The method comprises the automatic segmentation and pitch synchronous labeling of the units and a selection procedure to extract the best instance per unit from a generic speech corpus. The segmentation is performed by an automatic HMM alignment. The introduction of the demiphone improve...
Whistler: a trainable text-to-speech system
We introduce Whistler, a trainable Text-to-Speech (TTS) system that automatically learns the model parameters from a corpus. Both prosody parameters and concatenative speech units are derived through the use of probabilistic learning methods that have been successfully used for speech recognition. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and pro...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Trainable speech synthesis with trended hidden Markov models
In this paper we present a trainable speech synthesis system that uses the trended Hidden Markov Model to generate the trajectories of spectral features of synthesis units. The synthesis units are trained from a transcribed continuous speech corpus, making the speech more natural than that produced by conventional diphone synthesisers which are generally trained from a highly articulated speech...
Publication date: 1998